Sankhya: An Unbiased Benchmark for Bangla Handwritten Digits Recognition
Fuad Rahman, Aminul Islam, AKM Shahariar Azad Rabby
9-12 December 2019, Open Science in Big Data (OSBD) Workshop, 2019 IEEE International Conference on Big Data, Los Angeles
Description
The rise of artificial intelligence, along with machine learning and deep learning, is opening up almost limitless possibilities. In recent years, application-oriented researchers in machine learning and deep learning have started developing solutions for many practical problems. Handwriting recognition is one such area of interest, and Bangla, the seventh most spoken language in the world, is no exception. However, unlike for English, there have been no concerted formal attempts to build a benchmark for comparing the different approaches reported in the literature. This is mainly because openly and freely available datasets have been lacking and because the approaches are diverse and have not been formally compared. In this paper, we seek to close this gap. We benchmark five robust algorithms on all publicly available Bangla handwritten digit datasets, including Ekush, NumtaDB, CMARTdb, and BDRW: K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Random Forest (RF), Multi-layer Perceptron (MLP), and Convolutional Neural Network (CNN). NumtaDB is itself a collection of five handwriting datasets. We also compare our results with other state-of-the-art research. In addition, we fine-tune these algorithms by searching for their best possible hyper-parameters. We hope that this Sankhya paper will serve as the starting point of an open and verifiable benchmarking process, which we plan to repeat every two years from now on, and that it will set a standard for testing and validating new and novel algorithms reported in this area in the future.
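The description mentions fine-tuning each algorithm by searching for its best hyper-parameters. As an illustration only, the following is a minimal sketch of that idea for KNN: an exhaustive search over the number of neighbours k, scored by accuracy on a held-out validation split. It is not the authors' code. The toy 2-D points stand in for real digit features, and the function names (`knn_predict`, `accuracy`, `grid_search_k`) and the candidate values of k are hypothetical.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest training samples.

    train is a list of (feature_vector, label) pairs (hypothetical format).
    """
    neighbours = sorted(train, key=lambda item: dist(item[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test samples whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


def grid_search_k(train, val, candidates):
    """Return (best_k, best_accuracy); ties go to the earliest candidate."""
    scores = {k: accuracy(train, val, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores[best_k]


# Toy data: two clusters plus one deliberately mislabeled point at (0.4, 0.4).
train = [((0, 0), "0"), ((0, 1), "0"), ((1, 0), "0"), ((0.4, 0.4), "1"),
         ((5, 5), "1"), ((5, 6), "1"), ((6, 5), "1")]
val = [((0.5, 0.5), "0"), ((5.5, 5.5), "1")]

print(grid_search_k(train, val, [1, 3, 5]))  # (3, 1.0)
```

With k = 1 the mislabeled point decides the first validation sample, so accuracy drops to 0.5; larger k smooths this out. The same search-then-validate pattern would apply to the other algorithms' hyper-parameters, though the actual search spaces used in the paper are not given here.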